AI024
ROCm and HIP: A Detailed 10-Chapter Tutorial
Performance Engineering on AMD GPUs
Learning Objectives
- Identify architectural bottlenecks using Omniperf and ROCProfiler.
- Optimize memory access patterns to maximize HBM2e/HBM3 throughput.
- Understand wavefront scheduling and occupancy on the CDNA Compute Unit.
- Implement instruction-level optimizations for vector and matrix cores.